Data reduction through early grouping

Authors

  • Weipeng P. Yan
  • Per-Åke Larson
Abstract

SQL queries containing GROUP BY and aggregation occur frequently in decision support applications. Grouping with aggregation is typically done by first sorting the input and then performing the aggregation as part of the output phase of the sort. The most widely used external sorting algorithm is merge sort, consisting of a run formation phase followed by a (single) merge pass. The amount of data output from the run formation phase can be reduced by a technique that we call early grouping. The idea is straightforward: simply form groups and perform aggregation during run formation. Each run will now consist of partial groups instead of individual records. These partial groups are then combined during the merge phase. Early grouping always reduces the number of records output from the run formation phase. The relative output size depends on the amount of memory relative to the total number of groups and the distribution of records over groups. When the input data is uniformly distributed (the worst case), our simulation results show that the relative output size is proportional to the (relative) amount of memory used. When the data is skewed (the more common case in practice), the relative output size is much smaller.
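The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `form_runs` and `merge_runs`, the `max_groups` memory limit, and the choice of SUM and COUNT as aggregates are all assumptions made for illustration. Flushing the whole in-memory table when it fills up is also a simplification; the abstract does not say which run formation policy the paper uses.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def form_runs(records, max_groups):
    """Run formation with early grouping (illustrative sketch).

    records: iterable of (key, value) pairs.
    max_groups: hypothetical memory budget, counted in groups.
    Returns a list of runs; each run is a key-sorted list of
    partial groups (key, partial_sum, partial_count).
    """
    table = {}  # key -> (partial_sum, partial_count)
    runs = []
    for key, value in records:
        # Assumed policy: when memory is full and a new group arrives,
        # write the whole table out as one sorted run and start over.
        if key not in table and len(table) >= max_groups:
            runs.append(sorted((k, s, c) for k, (s, c) in table.items()))
            table = {}
        s, c = table.get(key, (0, 0))
        table[key] = (s + value, c + 1)  # aggregate during run formation
    if table:
        runs.append(sorted((k, s, c) for k, (s, c) in table.items()))
    return runs


def merge_runs(runs):
    """Single merge pass that combines partial groups with equal keys."""
    merged = heapq.merge(*runs, key=itemgetter(0))
    result = []
    for key, parts in groupby(merged, key=itemgetter(0)):
        total = count = 0
        for _, s, c in parts:
            total += s
            count += c
        result.append((key, total, count))
    return result


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]
runs = form_runs(records, max_groups=2)
print(sum(len(r) for r in runs))  # partial groups written, versus 6 input records
print(merge_runs(runs))
```

With room for only two groups in memory, the six input records in this toy input are written out as five partial groups. With more memory, or with data skewed toward a few keys, more records are absorbed into existing groups before a run is written, which is the effect the abstract describes.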


Similar resources

Grouping and Duplicate Elimination: Benefits of Early Aggregation

Early aggregation is a technique for speeding up the processing of GROUP BY queries by reducing the amount of intermediate data transferred between main memory and disk. It can also be applied to duplicate elimination because duplicate elimination is equivalent to grouping with no aggregation functions. This paper describes six different algorithms for grouping and aggregation, shows how to inco...


Incorporating local image structure in normalized cut based graph partitioning for grouping of pixels

Keywords: Perceptual grouping; Early human vision; Image pixel grouping; Local image structure; Graph partitioning; Normalized cut. Abstract: Graph partitioning for grouping of image pixels has been explored extensively, with normalized cut based graph partitioning being one of the popular approaches. In order to have a credible allegiance to the perceptual grouping taking place in early human vision, we p...


Multi-level Grouping Genetic Algorithm for Low Carbon Virtual Private Clouds

Optimization problem of physical servers consolidation is very important for energy efficiency and cost reduction of data centers. For this type of problems, which can be considered as bin-packing problems, traditional heuristic algorithms such as Genetic Algorithm (GA) are not suitable. Therefore, other heuristic algorithms are proposed instead, such as Grouping Genetic Algorithm (GGA), which ...


EFFECTIVE GROUPING FOR ENERGY AND PERFORMANCE: CONSTRUCTION OF ADAPTIVE, SUSTAINABLE, AND MAINTAINABLE DATA STORAGE by

David S. Essary, PhD, University of Pittsburgh, 2011. The performance gap between processors and storage systems has been increasingly critical over the years. Yet the performance disparity remains, and further, storage energy consumption is rapidly becoming a new critical problem. ...


Impact of Grouping Type in Descriptive Collaborative Writings on Iranian EFL Learners' Written Grammatical Accuracy

The current study was an attempt to investigate the impact of grouping type on the grammatical accuracy of Iranian EFL learners in collaborative writing. After the Michigan Test of English Language Proficiency was administered, 64 available female university students participated in this study and were assigned to two groups, heterogeneous and homogeneous. The treatment process lasted 12 weeks o...




Publication date: 1994